Exact solution of bond percolation on small arbitrary graphs 
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We introduce a set of iterative equations that exactly solves the size distribution of components on 
small arbitrary graphs after the random removal of edges. We also demonstrate how these equations 
can be used to predict the distribution of the node partitions (i.e., the constrained distribution of the 
size of each component) in undirected graphs. Besides opening the way to the theoretical prediction 
of percolation on arbitrary graphs of large but finite size, we show how our results find application 
in graph theory, epidemiology, percolation and fragmentation theory. 
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Introduction. — Percolation on graphs is the study of 
the behavior of components in graphs whose nodes/edges 
are removed randomly. It has received a lot of attention 
recently for its various applications in many disciplines. 
Among them, let us mention the modeling of epidemic 
propagation [IHl] where the size distribution of compo- 
nents corresponds to the outbreak size distribution. The 
same distribution is also related to the size and compo- 
sition of fragments in nuclear multifragmentation [SHS]. 
Finally, the study of the percolation threshold allows an 
assessment of the robustness (or reliability) of real net- 
works to the failure of their nodes or edges [SHU] • Along- 
side the intrinsic theoretical interest, this type of stud- 
ies has triggered the development of increasingly realistic 
and complex models (see |15H17| , and references therein) . 

In the quest for ever more realistic models, a promis- 
ing idea is to consider real networks not at the level of 
the nodes, but at higher levels of organization such as 
motifs, subgraphs or communities. It has recently been 
proposed that this perspective could help unify and ex- 
plain many of the universal properties found in real net- 
works [18]. From the modeling perspective, using motifs 
as the fundamental building blocks of graphs has allowed 
to relax some of the limiting assumptions behind exist- 
ing bond percolation models pUH^ . This has effectively 
extended the class of models for which exact results can 
be obtained. 

The advantages gained come at a price however: one 
needs to solve beforehand the distribution of the size of 
components in these motifs. While this can be done 
systematically for Erdos-Renyi subgraphs (or cliques) 
[m (22 , the general problem must be solved by hand on 
a case-by-case basis. Hence, one is either limited to very 
small graphs or one has to rely on numerical simulations. 

In this Letter, we introduce a set of iterative equations 
that exactly compute the size distribution of components 
in a multitype version of Erdos-Renyi graphs. In the 
case where a different type is assigned to each node of 
a graph, the equations produce this distribution for any 
small arbitrary graphs, defined by an asymmetric non- 
negative adjacency matrix, after the random removal of 



its edges. These equations therefore provide a systematic 
way to compute the required distributions in the motif- 
based bond percolation models mentioned previously. 

We further show how these equations naturally offer a 
method to count the number of labeled multitype graphs 
with a given number of nodes and edges. We also explain 
how they can be used to study bond percolation on pe- 
riodic infinite lattices. We finally demonstrate how these 
equations allow the exact calculation of the constrained 
distribution of the size of each component, or the parti- 
tion of nodes, in undirected graphs. It is suggested that 
this provides a null model for fragmentation processes. 

Percolation on multitype random graphs. — Let us 
consider a multitype generalization of the random graph 
model Qn,p [H]. These graphs are composed of n nodes 
and an edge exists between any two nodes with probabil- 
ity p regardless of other potential edges. We generalize 
this model by labelling nodes using M types such that a 
graph of size n = {ui, . . . , hm)^ is composed of Ui type-i 
nodes (with i = 1, . . . , M). Directed edges from type-z 
nodes to type-j nodes (noted i — > j) exist with proba- 
bility pij independently of one another, pij need not be 
equal to pji since edges are directed. Results statistically 
equivalent to undirected graphs — where an undirected 
edge exists with a given probability — are obtained in 
the symmetric case {pij = Pji). While the usefulness 
of this multitype generalization will become manifest in 
the next section, multitype Erdos-Renyi graphs can be 
used as a first approximation of structures in which cor- 
relations exist in the way nodes are connected (with an 
appropriate choice of Pij). 

We define Qi{l\n) as the probability that the com- 
ponent reached from a type-i node contains I = 
{li, . . . ,Im)~^ nodes (including the initial type-i node) 
given that the graph contains n nodes. Because of the 
presence of directed edges, we extend the definition of 
a component to all nodes accessible from a given node. 
Thus, node A being in the component reached from node 
B does not imply that the converse is true. In the same 
spirit of [14| I22j , we now derive two recurrence equations 
allowing for the explicit calculation of Qi{l\n). 
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FIG. 1. (color online) (a) Example of an arbitrary graph of 20 nodes. Although not shown here, graphs with multiple edges 
could as easily be considered by our method, (b) Distribution of the size of the component reached from a randomly chosen 
node in the graph shown in (a) where edges have been removed with probability 1 — T. Lines were obtained using eqs. ([2|-(j4]|, 
and symbols were obtained by performing 10^ simulations. The distributions are discrete; lines have been added to guide the 
eye. 



Let us consider a graph of size n, a component of size Z, 
and an initial node of type i. We note 5ij the Kronecker 
delta. Among the rij — Sij nodes of type j (excluding the 
initial node of type i), there are (Ji^Zg'^.) ways to choose 
the Ij type-j nodes that are part of the component. These 
Ij type-j nodes will not lead to any of the remaining 
n — I nodes that are not part of the component with 

probabihty life 'Zjfc'""'' '''''' where qjk = 1 — Pjk- In other 
words, this is the probability that no directed edge j ^ k 
exists between the Ij type-j nodes in the component and 
the n — I nodes of every other types excluded from it. 
Repeating this procedure for the other types of nodes in 
the component, together with the observation that the 
I nodes form a component with probability Qi{l\l), we 
have that 

jk ^ 

That is, by knowing the probability of finding a compo- 
nent of size i in a graph of size I, eq. ([!]) computes the 
probability of finding a component of size Z in a graph 
of size n (with rij > Ij for all j). Finally, to obtain 
Qi{l\l), we note that the distribution {Qi(rn\V)} must be 
normalized for a given graph size Z, hence 

g,(Z|Z) = 1 - ^ Q^{m\l) . (2) 

m<_l 

The sum covers every possible instances of m such that 
m,j < Ij for all j but excludes the case where rrij ~ Ij for 
each j. Starting with the initial condition Qi{8i\8i) = 1, 
where 5i = (5ii, . . . , (Jim)^, we can therefore calculate 
every coefRcient Qi{l\ri) using eqs. ([l])"([2| iteratively. In 



other words, from a graph made out of a single node, 
eqs. ([T])-([2]) extend the graph to the desired size, and keep 
track of the component size distribution along the way 
to build the final distribution Qi{l\n). A simple example 
of such a calculation is given in the Appendix. Setting 
M = 1, we retrieve the recurrence equations presented 
in [131 [55]. Also, a similar approach has been used to 
analyse the reliablity of communication networks jl3j . 

Using the identity 1 = Ilj^fefe + qjuT'^""-'^"-^ in 
eq. ([2]), where nj{nk — Sjk) is the maximal number of 
j ^ k edges in the graph, the iteration of eqs. ([T])-([2]) 
yields polynomials whose coefficients have a direct com- 
binatorial interpretation. Indeed, the coefficient in front 

IljkP'jk''ljk' is simply the number of dis- 

tinct ways to reach a component of size Z in a graph of 
size n from a type-« node using Ujk existing and bjk ab- 
sent j ^ k edges, respectively. The sum ajk + bj^ may 
be different than nj{nk — Sjk) as the existence of some 
edges may be irrelevant to the component. Of particuliar 
interest is the symmetric case where pjk = Pkj for all j 
and k for which Qi{n\'n) is independent of i and whose 
coefficients are the number of connected labeled graphs 
of size n. Hence, we see that eqs. ([l])-([2| offer an alter- 
native method to enumerate the number of graphs with 
a given number of edges and labeled nodes of different 
types. 

Percolation on arbitrary graphs. — By considering 
that each node of an arbitrary graph belongs to its own 
type, eqs. ([l])-(|2]) can exactly predict the outcome of a 
bond percolation process that has occurred on it. Pre- 
dicting the outcome here is as precise as knowing the 
identity of the nodes that have been reached and of the 
ones that have not. To illustrate this point, let us con- 
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sider the simplest case where an edge is to be kept with 
the same probabihty T during the percolation process 
[23] , The probability pjk for the edge j k to exist 
then becomes Pjk — 1 — (1 — r)"^^'' , where A^-fc G N is an 
element of the adjacency matrix A corresponding to the 
number of directed edges from node j to node k. Arbi- 
trary graphs with directed edges and multiple edges can 
thus be considered with our method. 

Because each node belongs to its own type, the ele- 
ments of the vectors Z, m and n indicate whether each 
node is present (value 1) or not (value 0). This allows us 
to write eq. ([T]) in a simpler and more compact form 



Q,{l\n)^Q:[l\m-T) 



I'Al 



(3) 



where the elements of I are defined such that Ij + Ij = rij 
for all j. That is, l^Al is the number of outgoing edges 
that must not exist for the component of size I to be 
isolated from the rest of the graph. Using eqs. ([2|-([3| 
together with the initial condition Qi{Si\6i) = 1, we can 
calculate the exact probability of each individual outcome 
of a percolation process on an arbitrary graph defined by 
its adjacency matrix A. 



To support this claim, fig. 1(b) compares the predic- 
tions of eqs. ([2|-([3| with the results of numerical simula- 



tions of bond percolation on the graph shown in fig. 1 (a) 
To lighten the presentation of the results, fig. 1(b) [ shows 
the probability qk of finding a component of size k — re- 
gardless of the identity of the nodes — from a randomly 
chosen node. This quantity is computed using 



Qk 



(4) 



where S{- ■ ■) is the delta function. We observe an excel- 
lent agreement between our theoretical predictions and 
the numerical results. Although for such a small net- 
work no precise percolation threshold can be defined, it 
is however clear that a qualitative change toward a "gi- 
ant component" is initiated for T ~ 0.5-0.6. Also, the 
irregular shape of the distribution for some values of T 
highlights how Qi{l\n) can depend on the precise struc- 
ture of the graph. This advocates for the importance of 
developing methods that consider explicitly the structure 
of the graphs (i.e., the adjacency matrix). 

Since eqs. ([l])-([3]) consider every possible outcomes of 
the percolation process, their predictions are exact. How- 
ever, the calculational burden (e.g., required memory, 
number of operations) increases very quickly with the 
number of node types M. In the case of arbitrary graphs, 
it grows exponentially with the number of nodes. Thus, 
although eqs. ([l])-([3]) are in principle valid for graphs of 
any size, their use becomes cumbersome for large graphs. 
With our present computer facilities, a straightforward 
implementation of eqs. ([2])-(|3]) have been able to handle 
graphs of size of the order of 25. A wiser implementa- 
tion could certainly push this limit somewhat further. 



When dealing with a given graph, specific features of its 
structure may however be used to reduce the numerical 
effort. For instance, the distributions for different mod- 
ules could be solved separately, and then recombined to 
obtain Qi{l\n) for the whole graph. Quantum computa- 
tion of the sort described in [21] may also be a solution 
for larger graphs. 

Despite these limitations, our method compares favor- 
ably with an exact enumeration method where a com- 
puter program explicitly considers each possible edge 
configuration, and then computes the component size dis- 
tribution from them. Firstly, the computational demands 
of this approach scales exponentially with the number of 
edges L, whereas our approach scales exponentially with 
the number of nodes. The performance of our method 
should therefore be comparable to direct enumeration for 
sparse graphs, and should rapidly surpass it for denser 
graphs. Secondly, our method yields analytical solutions 
(i.e., polynomial in pj^.) valid for any value of Pjk- 

Equations ([2])-([3]) can also be used to compute the 
bond percolation threshold of infinite periodic lattices. 
By virtue of the triangle-triangle transformation [25j . the 
percolation threshold is the root of a polynomial related 
to the connectivity of the basic cell of the lattice, which 
is a combination of the coefficients of Qi{l\n). Thus our 
approach offers a systematic and exact way to compute 
this polynomial for complicated basic cells. The Ap- 
pendix provides an example. Furthermore, our approach 
offers a systematic way to obtain the renormalisation- 
group transformation to estimate the scaling exponents 
and the bond percolation threshold of infinite lattices (see 
for instance eq. (3.4) in [26]). 

Finally, eqs. ^ and ^ can be combined to compute 
Qi{l\n) for graphs where nodes of different types inter- 
act through an arbitrary configuration of edges (see [19] 
for an explicit example). This allows to generate a wide 
range of realistic subgraphs (or motifs) found for instance 
in social networks, and to include them in motif-based 
bond percolation models [HI [20] • As the component size 
distribution is closely related to the outbreak size dis- 
tribution, our approach allows to study the spread of 
infectious diseases in more realistic urban settings [27]. 

Predicting node partition distribution. — We can also 
use eqs. ([l])-([3]) to calculate the distribution of the num- 
ber of components (and their size) found in an undi- 
rected graph after the removal of a fraction of its edges. 
We restrict ourselves to undirected graphs because only 
in undirected graphs are components uniquely defined. 
That is, two nodes will be found in the same compo- 
nent with one unique probability regardless of the start- 
ing node. We illustrate how to perform the calculation 
using the Qn.p model. It should nevertheless be clear 
that equations for undirected multitype random graphs 
{pij — pji for all i and j) and undirected arbitrary graphs 
(A = A^) can be derived in a similar manner. 

Let us calculate the probability for a random graph 
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FIG. 2. (color online) Probability of finding each node par- 
tition in Qn,p with n = 12 and for various values of p. Lines 
were obtained using eq. Q, and symbols were obtained by 
performing over 5 x 10* simulations. The |Pi2| = 77 integer 
partitions of 12 are displayed in an increasing order of the 
number of components, i.e., from the node partition where 
there is only one component (noted {12}) on the left to the 
case where there are 12 components of one single node (noted 
{1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1}) on the right. Linebreaks and 
vertical grey lines indicate where the number of components 
changes. Node partitions with the same number of compo- 
nents are displayed in an decreasing order of the largest com- 
ponents sizes (e.g., {8,2,2} before {7,4,1} before {7,3,2}). 



composed of n nodes to be split into k components of 
size r = (ri, r2, r^) with "^.^ri — n. Each com- 
ponent I will be connected, and therefore be a compo- 
nent, with probabiHty Q{ri\ri). We have dropped the 
subscript in Q{ri\ri) because the probability to find one 
single connected component in an undirected graph is 
the same regardless of the starting node. These com- 
ponents will be isolated from one another if none of the 
possible edges between nodes of different components ex- 
ist. For the whole graph, this happens with probability 
]^^^^ (1 — pY'^^ where p is the probability for an edge to 
exist between any two nodes. 

The remaining step is to count the number of ways 
the n nodes can be divided into r components, which 
we note Sr- This number is in fact equal to the number 
of ways to put n labeled objects into k nonempty and 
unlabeled containers of size ri, r2, . . . , and rk- First, let 
us point out that the number of ways to put n labeled 
objects into k labeled containers whose sizes are given 
by r is simply the multinomial coefficient fiVTli'^j'- "^'^ 
obtain Sr, we just remove the redundant configurations 
due to containers of the same size. Hence, noting dm = 
S{m — ri) the number of containers of size to, we get 



(n™rf™!)(n. 



(5) 



Note that Sr is related to the Stirling number of the sec- 
ond kind { 5c } [28 giving the number of ways to put n 



labeled objects into k nonempty and unlabeled contain- 
ers. Indeed { ^ } is simply the sum of every Sr such that 
r has k elements 



Uf XI Sr'5(dim(r) - fc) 



(6) 



where Vn is the set of integer partition of n, i.e., the set 
of decompositions of n into a unordered sum of integers 

Combining these three contributions, we obtain the 
probability for a random graph composed of n nodes to 
be split into k components of size r to be 



P{r)^Sr\[Q{ri\ri)\{{l-pY'^' 



(7) 



To validate eq. ([t]), fig. [2] compares its predictions with 
the results obtained from numerical simulations of the 
Qn.p model for n = 12 and for various values of p. Again, 
an excellent agreement between our theoretical predic- 
tions and the results of the numerical simulations is ob- 
served. Figure [2] highlights the emergence of a "giant" 
component — which occurs when (n — l)p = 1 in the 
limit n — > oo — as the distribution migrates toward par- 
titions with fewer components with increasing p. It also 
shows that, for a same number of components, the parti- 
tions with larger components are more likely to occur in 
general. Although partly shown in fig.[2j this trend holds 
for all values of p G [0, 1], and is due to the simple fact 
that there are more ways (i.e., possible configuration of 
edges) to build large connected components than small 
ones. 

Our method can also serve as a null model for fragmen- 
tation processes. In fact, P{r) is the probability for n el- 
ements to be distributed among r fragments when bonds 
occur (or resist) randomly with probability p. Physical 
correlations can therefore be highlighted by comparing 
P{r) with experimental data. Similar results for Qn.m 
|30) . where the number m of edges (or energy) is fixed 
rather than its average value {^p^ have recently been 
used in the context of nuclear multifragmentation [5]. As 
Qn,p and Qn,m can be seen as the "canonical" and "micro- 
canonical" version of Erdos-Renyi random graphs, the 
results for Qn.m in [S] can be reobtained using our equa- 
tions (see the number of connected graphs at the end of 
the second section). Our method therefore emcompasses 
previous results, and fills the gap in contexts where the 
canonical approach is more relevant. 

Conclusion. — We have introduced a set of iterative 
equations that computes the distribution of the size of the 
components in small random or arbitrary graphs. As di- 
rected and multiple edges can naturally be accounted for 
in the equations, our method is suitable for a wide range 
of arbitrary graphs. Because the equations consider sys- 
tematically all possible outcomes of the bond percolation 
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process, their predictions are exact. We have also demon- 
strated that they can be used to calculate the constrained 
distribution of the size of each component (i.e., node par- 
tition) for undirected small graphs. We have illustrated 
how these results find applications in various disciplines 
like graph theory, percolation theory, epidemiology and 
fragmentation theory. We believe that, despite the in- 
creasing unwieldiness of the calculation with the number 
of nodes, our results open the way to the theoretical pre- 
diction of bond percolation on large, but finite, arbitrary 
graphs. 

This work has been supported by the Canadian Insti- 
tutes of Health Research (CIHR), the Natural Sciences 
and Engineering Research Council of Canada (NSERC), 
and Le Fonds de recherche du Quebec - Nature et tech- 
nologies (FRQ-NT). 

Appendix: Explicit calculation of Qi{l\n) and 
application. — We perform an explicit calculation of 
Qi{l\n) to clarify the use of eqs. ([T])-(|2]) and to illustrate 
some of our claims. 

Let us consider a simple graph composed of 3 nodes 
of type 0, 1 and 2. The directed i j edge exists with 
probability pij and there are 6 possible directed edges. 
We note qij = 1 — pij . We take the node of type as the 
starting node without loss of generality as the two other 
distributions can be obtained by permutation. 

The calculation begins with the initial condition 
Qo(l, 0, 0|1, 0, 0) = 1 stating the obvious fact that the 
probability of finding a component I = (1, 0, 0) in a graph 
of size n — (1,0,0) is 1. Equation ^ provides the prob- 
ability of finding the same component but in graphs re- 
spectively of size (1, 1, 0), (1, 0, 1) and (1, 1, 1) 

Qo(l,0,0|l,l,0) -goi 
Qo(l,0,0|l,0,l) =902 
Qo(l,0,0|l, 1,1) =901902 . 

Using eq. ([2]), we then compute the probability to find a 
component of size 2 in the graphs of size (1, 1, 0), (1, 0, 1) 

go(i,i,o|i,i,o) = 1 -Qo(i,o,o|i,i,o) =m 

Qo(l,0,l|l,0,l) = 1 -Qo(l,0,0|l,0,l) =po2 . 

We use once more eq. ([T]) to compute the probability 
of finding the same components but in a graph of size 
n= (1,1,1) 

Qo{l, 1,0|1, 1, 1) =Poi9i2go2 

Qo(l, 0, 1|1, 1, 1) = ^02921901 • 

Finally, the probability of reaching the whole graph of 
size (1, 1, 1) is obtained with eq. ^ 

Qoil, 1, 1|1, 1, 1) = 1 - 901902 - P0l9l2902 - P0292l901 

= P01P02 + P01P12902 + P02P21901 , 



where we have used the identity 

1 = (POl + g0l)(P02 + 902)(Pl2 + qi2){P21 + 92l) 

to obtain a polynomial with positive coefficients. As 
claimed previously, each term in this last polynomial 
can be interpreted as a path leading to the component 
I = (1,1,1), and its coefficient as the number of distinct 
realisations of such a path. 

With these results, we show how to compute the per- 
colation threshold Pc for the infinite triangular lattice 
using the triangle-triangle transformation |25| . Setting 
Pij — p for all i and j in Qo{l, 1, 1|1, 1, 1) with q = 1— p, 
we retrieve the probability for the three nodes to be con- 
nected in some way, that is p'^ + 3p^q. Remember that for 
undirected graphs, the probability of reaching the entire 
graph is independent of the starting node. The triangle- 
triangle transformation then stipulates that Pc is the low- 
est value in [0,1] for which this last probability is equal to 
the probability q'^ that none of the nodes are connected. 
Thus, Pc satisfies 



Pc 



3j5c + 1 - , 



whose only solution in [0,1] is pc ~ 2 sin (tt/IS), which is 
the exact value of the bond percolation threshold for the 
triangular lattice [31] . 

This simple example could have been solved without 
using eqs. ([I])-([2]). However, repeating this exercise for 
graphs of 4, 5 or 6 nodes should convince the reader 
that a systematic procedure, as provided by eqs. ([T])-(|2|, 
quickly becomes necessary. 
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